Scientific Data
Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match Scientific Data's content profile, based on 174 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit.
Poliva, O.
The neuroscience literature contains thousands of studies localizing cognitive, sensory, and motor functions to specific brain regions, yet this knowledge remains fragmented across experimental modalities, naming conventions, and spatial reference systems. Consequently, relating reported activations, lesions, or stimulation sites to the broader functional literature often requires substantial manual synthesis. The Brain Encyclopedia Atlas Project (BEAP) was developed to address this challenge by providing a spatially grounded framework for organizing literature-defined brain regions. BEAP is an expert-curated neuroinformatics resource that aggregates and spatially indexes literature-defined cortical and subcortical functional regions within a common anatomical reference framework. The project identifies 108 neocortical fields and 18 cerebellar fields defined through an analysis of published figures from 1,453 human studies using functional neuroimaging, intracranial electrophysiology, and cortical stimulation. These regions were manually aligned to standard anatomical templates and associated with parcels of the Human Connectome Project multimodal parcellation (MMP1). Inclusion criteria required convergent functional evidence, lesion support, and boundary-related contrasts. Additionally, 340 allocortical, diencephalic, cerebellar, and brain stem nuclei were delineated through comparison with histological atlases and research articles. The resource is publicly accessible at https://brainatlas.online/3d-brain/, featuring an interactive three-dimensional brain model that interfaces directly with a curated encyclopedia. This platform provides structured entries synthesizing regional functional descriptions, boundary-defining evidence, internal organization, and connectivity annotations. Furthermore, each entry is designed to evolve through community feedback via a dedicated comment section. 
By providing a unified spatial context at the whole-cortex scale, BEAP enables systematic comparison across studies and facilitates the identification of recurring patterns in cortical organization. It serves as an integrative resource for research and education, supporting the contextualization of neuroimaging findings and the generation of hypotheses regarding large-scale brain organization.
Niittynen, P.; Kemppinen, J.
We present FennoTraits, a dataset of plant functional trait and community composition data collected across northern Finland, Norway, and Sweden (Fennoscandia) in 2016-2025. The dataset has 42 049 abundance estimations and 155 794 functional trait observations from 10 traits representing 373 vascular plant species collected from 1 235 study sites within seven study areas. The trait measurements consist of size-structural, leaf economic, leaf spectral, and reproductive traits. The species represent the majority of the native vascular plant species that occur at the seven study areas, and many of the species occur in all seven areas across the two biomes, tundra and boreal forest, and their ecotone. Each study area has distinct characteristics and a range of habitats: tundra, meadows, wetlands, shrublands, and boreal forests. These areas are under low anthropogenic influence, and many of the sites are within protected areas that are reserved for nature conservation and scientific research. Finally, we provide a general description of the main trait patterns and profiles of the northern European flora.
Milne, L.; Simpson, C. G.; Guo, W.; Mayer, C.-D.; Milne, I.; Bayer, M.
We describe a major new release of the EoRNA database, a gene expression database for barley based on public data, first published in 2021. EoRNA v.2 (https://ics.hutton.ac.uk/eorna2/index.html) features an order of magnitude more samples and is based on a new automated workflow of sample discovery and processing, which has enabled a dramatic scale-up of the original database. EoRNA v.2 also features a major rebuild of the web user interface with rich new functionality. All infrastructure-related code, database schemas, and web components are now species agnostic and publicly available for reuse with other taxa. A dedicated new reference transcript dataset has been created for EoRNA v.2 which is largely based on the recently published barley pan-transcriptome and represents the most comprehensive dataset of its kind to date.
Tan, G. Z. H.; Urano, D.
Hyperspectral imaging is an imaging technique that allows for acquisition of high-resolution spectral information beyond that of the visible spectrum. When applied to plants, it effectively enables non-invasive characterization of physiological status and has been widely used in agricultural settings. Marchantia is a model bryophyte species whose flat morphology and visually distinct stress-response phenotypes make it an ideal candidate for imaging studies. Here, we provide a comprehensive protocol for hyperspectral imaging of Marchantia plants, which encompasses hardware configuration, data acquisition, and computational processing. This protocol features a streamlined data processing pipeline hosted on a web-based development platform that automates 1) the segmentation of plant area into spatially distinct regions for localized analysis of intra-specimen physiological gradients, and 2) the classification of plant pixels based on their spectral signatures. All results are exported as structured CSV files for ease of further analysis as desired by the user.
Bhagwat, N.; Wang, M.; Dugre, M.; Pfarr, J.-K.; Dai, A.; Urchs, S.; McPherson, B.; Gau, R.; van Heese, E. M.; d'Angremont, E.; Laansma, M. A.; Prasad, S.; Sanz-Robinson, J.; Torabi, M.; Jahanpour, A.; Danyluik, M.; Joubert, A.; Macdonald, A.; Waller, L.; Stewart, A.; Joulot, M.; Dickie, E.; Devenyi, G. A.; Bouix, S.; Bollmann, S.; Jahanshad, N.; Thompson, P. M.; Burgos, N.; Chakravarty, M. M.; Halchenko, Y. O.; van der Werf, Y. D.; Poline, J.-B.
Neuroimaging data management and processing are tedious and error-prone, prompting reproducibility concerns. Globally, studies with heterogeneous infrastructure and governance policies lead to eclectic data processing and sharing, necessitating standardization of data workflows to ensure reusability and comparability of multi-centric datasets. The Nipoppy neuroinformatics framework facilitates such standardization by combining specification, protocol, and software to manage study-level data workflows. With its adoption, researchers can share standardized, derived datasets enabling efficient, reproducible, and inclusive research.
Ortega-Solis, G. R.; Stastny, K.; Bejcek, V.; Telensky, T.; Mellado-Mansilla, D.; Zarybnicky, J.; Grattarola, F.; Zarybnicka, M.; Vermouzek, Z.; Vorisek, P.; Leroy, F.; Tietje, M.; Soria, C. D.; Padulosi, E.; Travnickova, E.; Wolke, F. J. R.; Keil, P.
Motivation: High-quality biodiversity data with temporal replicates, produced using standardized fieldwork protocols, are rare yet essential for studying long-term biodiversity dynamics. Most available large-scale temporal data only date back one or two decades and/or originate from spatially discrete local observations. Here, we release spatially contiguous, systematically collected, and gridded occurrence data for breeding birds in Czechia, covering the periods 1973–1977, 1985–1989, 2001–2003, and 2014–2017. This database represents the monitoring of ca. 41% of European bird species over 40 years, and it is one of the longest-running nationwide bird-monitoring efforts in the world. We also complement the original data with geospatial metrics to characterize the sampling polygons and provide proxies of sampling effort. By making this dataset openly accessible, we aim to strengthen biodiversity change studies, citizen science, and ornithological research with long-term, highly curated records, backed by well-documented methods, and ready for integration with other datasets.
Main Types of Variables Contained: A total of 286,302 breeding bird detections/non-detections per grid cell from 247 species (ca. 41% of the 596 species breeding in Europe). The fourth atlas also contains 9,471 timed species lists totaling 276,076 additional records collected with standardized effort and partially random spatial sampling on smaller squares dividing the original grid cells.
Spatial Location and Grain: Czechia (total area of 78,871 km²) covered by a grid of 887 grid cells of 10 by 10 km for the period 1973–1977, and 678 cells of 6 minutes latitude and 10 minutes longitude (~11.2 x 12 km) from 1985 onwards. The timed species lists were collected across 4,851 of 9,844 small squares (~2.8 x 3 km) that subdivide each original grid cell into 16 smaller polygons.
Time Period and Grain: The sampling years were 1973–1977 (5 breeding seasons), 1985–1989 (5 breeding seasons), 2001–2003 (3 breeding seasons), and 2014–2017 (4 breeding seasons).
Major Taxa and Level of Measurement: Birds (Aves). The breeding evidence per species and grid cell was classified following the European Breeding Birds Atlas 2. We provide species-level records matched to the HBW/BirdLife version 9 (2024).
Format: The dataset is available for download from Zenodo and is provided as CSV files with fields standardized to Darwin Core, and a GeoPackage file containing all of the spatial grids used. The data are organized into separate files for records and sampling events, corresponding to each atlas. All data are licensed under CC-BY 4.0.
Uiterwaal, S. F.; La Sorte, F. A.; Coblentz, K. E.; DeLong, J. P.
Motivation: The diet composition of a predator is a direct reflection of its role in a food web, resulting from interactions with prey species. Raptors (including hawks, owls, and falcons) are ubiquitous predators with diverse diets, yet there is no comprehensive database of raptor diet composition. We present a database of over 3500 raw raptor diet records, compiled from more than 1000 studies and representing 173 raptor species from across the world. Our dataset complements existing qualitative summaries of species diets by compiling thousands of quantitative diet "samples" over time and space to present diet data at a uniquely fine resolution.
Main types of variable contained: The database comprises published records of raptor diets from pellets, prey remains, direct or photographic observations, prey DNA, and raptor gut or gullet contents. For each diet, we present the taxonomic identity and amounts of consumed prey. We additionally present various metadata for each diet such as location, habitat, and season.
Spatial location and grain: The study incorporates diet records collected worldwide, with each record assigned geographic coordinates corresponding to the location where the diet information was obtained.
Time period and grain: The database includes diet records from 1893 to 2025. We report a year for each diet record.
Major taxa and level of measurement: We recorded raptor diet at the species level, including raptors from three orders: Strigiformes, Falconiformes and Accipitriformes excluding vultures. Most prey are identified to species, but prey taxonomic level varies depending on the extent to which they could be identified.
Software format: Diet records and metadata are provided in two files with comma-separated value (.csv) format.
Wolters, F. C.; Woldu Semere, T.; Schranz, M. E.; Medema, M. H.; Bouwmeester, K.; van der Hooft, J. J. J.
Plants produce the most diverse blends of specialized metabolites on earth. Natural products derived from plants are valuable resources for drug development, food chemistry, and crop resistance breeding. Phenotypes of specialized metabolite profiles can be captured by untargeted mass-spectrometry across species phylogeny, tissues, and genotypes. Here, we collected metabolic fingerprints of 17 Brassicaceae species across three tissues (paired leaf and root; flower) using liquid chromatography-tandem mass spectrometry (LC-MS/MS) in positive and negative ionization mode. Corresponding metadata has been refined for reuse according to ReDU guidelines, and for integration with public genomic and transcriptomic data. Standardization of in vitro growth conditions, and data processing workflows enables integration of acquired raw and processed data across platforms for single- and multi-omics analysis. Further, the inclusion of tissue-specific metabolic profiles across ploidy levels, as well as across crop species and wild relatives, makes this dataset a valuable resource for natural product discovery.
Webster, J. M.; Shojaie, A.; Shen, Y. A.; Le, T.; Ragaglia, E.; Bogdani, M.; Kirkland, A.; Mac Donald, C.; Latimer, C. S.; Keene, C. D.; Grabowski, T. J.
Human brain tissue preserved in biorepositories is foundational for the structural, cellular, and biomolecular research necessary for a mechanistic understanding of neurological diseases. Realizing the research potential of these valuable resources requires well-characterized, research-relevant tissue that can be efficiently identified by investigators and incorporated into the conceptual and computational frameworks of interdisciplinary research. Several large-scale efforts to improve research reliability and reproducibility have sought to characterize and annotate the processes by which these samples are collected, yet limited progress has been made on standardizing spatial information for these samples. Biorepositories systematically collect brain tissue according to a brain sampling protocol (BSP) that differs between institutions, yet explicit spatial information regarding the samples may not be documented in standard operating procedures (SOPs). The anatomical location details available to investigators are inconsistent across biorepositories and typically lack sufficient anatomical precision to ensure correspondence with samples from other biorepositories or with research-relevant brain regions specified by neuroimaging, functional, or disease-susceptibility criteria. Here, we introduce a pipeline for developing a Spatial Atlas for Mapping Protocol Locations of Ex vivo Samples (SAMPLES), which uses a neuroimaging framework to create a 3D representation of a BSP through a metrically precise digital instantiation of the procedures for brain extraction, segmentation, slicing, and sampling on a modern digital brain template. SAMPLES incorporates modern neuroinformatics conventions to create explicit 3D labels of BSP-defined samples that can be interactively visualized with freely available neuroimaging software. 
We illustrate the pipeline by developing an atlas for the protocol from the University of Washington BioRepository and Integrated Neuropathology laboratory (UW BRaIN SAMPLES). By providing an explicit, computable reference, SAMPLES atlases can support the efficient identification, referencing, and utilization of postmortem samples for interdisciplinary research. These capabilities enable biorepository workflows, data harmonization across biorepositories, and integration with antemortem neuroimaging.
Barham, M. P.; Morrison-Ham, J.; Greenwood, C. J.; Bertazzoli, G.; Rogasch, N. C.; Bereznicki, H. G.; Younger, E. F.; Ellis, E. G.; Graeme, L. G.; Cunningham, D. A.; Liao, W.-Y.; Fried, P. J.; Pascual-Leone, A.; Enticott, P. G.; Corp, D. T.
Currently, there is no consensus about how investigators should format their NIBS data for sharing. This presents a barrier to the advancement of big data analyses because it requires time-consuming operations to generate consistent formats across different shared datasets. Recently, we launched Big non-invasive brain stimulation data (Big NIBS data), an open-access platform and repository for NIBS data (https://www.bignibsdata.com/), providing a structured mechanism for researchers to share NIBS data. However, the reusability and interoperability of data uploaded to Big NIBS data is restricted by the absence of a common data structure. The current paper addresses this problem by creating the NIBS data analysis structure (NIBS-DAS), a template pipeline for the layout, management, and analysis of collated NIBS outcome data. While its primary purpose is to provide a template layout for uploading collated data to the Big NIBS data repository, NIBS-DAS also offers guidelines for the management and analysis of collated NIBS data, thereby forming a data analysis pipeline that can be freely used by the NIBS field in general. We anticipate that NIBS-DAS will serve to facilitate data sharing on the Big NIBS data platform and promote greater standardisation of data management and analytical practices in the NIBS field.
Zhang, M.; Liu, P. R.; Su, H.; Zhao, M.; Li, X.; Born, S.; Lee, Y.; Honey, C.; Chen, J.; Lee, H.
Spontaneous thought is pervasive in everyday human cognition, yet datasets capturing its neural dynamics under minimally interrupted conditions remain limited. The current dataset was acquired from a think-aloud functional MRI experiment in which 118 participants continuously verbalized their spontaneous thoughts during 10-minute scanning sessions. The raw MRI data and verbal transcripts with sentence-level timestamps were previously released and analyzed in our prior study examining neural activity associated with thought transitions. Building on that release, we additionally provide preprocessed MRI data, speech transcriptions with word-level timestamps aligned to image acquisition, large language model-generated ratings of transcribed thoughts across emotional and sensory dimensions, and self-report survey measures assessing personality, mental health, and cognitive abilities. Validation analyses demonstrated activation in expected cortical regions associated with speech production and sensory content identified from transcript annotations, agreement between language model and human ratings, and adequate internal consistency of survey measures, supporting the dataset's overall quality. This dataset enables reuse for investigations of spontaneous thought, speech generation, and individual differences using naturalistic functional MRI data.
Yamasaki, H.; Blache, P.; Schön, D.
Conversation is a fundamental human behaviour that requires rapid coordination between speaking, listening, and turn-taking, yet datasets capturing its neural dynamics in natural interaction remain scarce. Hyperscanning EEG is particularly valuable for this purpose because it records both interlocutors simultaneously, enabling the study of speaker-listener coupling, response timing, and dyadic coordination during live exchange. Here we present DUET (Dyadic Understanding, EEG and Turn-taking), a hyperscanning dataset for studying natural French face-to-face conversation. The dataset comprises recordings from 18 dyads (36 French-speaking adults) performing the Diapix collaborative spot-the-difference task across eight 4-minute face-to-face conversation blocks. EEG was recorded from all 36 participants; most recordings used 64 channels, with one pilot dyad recorded using 32 electrodes. The public release includes raw EEG recordings, precomputed ICA decompositions for reuse in downstream preprocessing, and various features derived from the audio and manually corrected transcripts.
Kambara, K.; Chen, Q.; Tsugama, D.
Grass Expression Atlas (GExA) is an interactive web-based resource for rapid exploration of gene expression across diverse tissues, developmental stages, and conditions in grass species. GExA integrates publicly available RNA sequencing (RNA-seq) datasets for four millets: pearl millet (Cenchrus americanus), foxtail millet (Setaria italica), proso millet (Panicum miliaceum), and finger millet (Eleusine coracana), and includes barley (Hordeum vulgare) and sorghum (Sorghum bicolor) as reference species. Datasets were processed using a unified processing workflow to generate expression values in transcripts per million (TPM). The current release comprises 4,673 samples from 442 BioProjects, including 987 pearl millet samples and 2,216 foxtail millet samples, and is provided through a user-friendly web interface. GExA is designed for scalable expansion to additional species via the pipeline used in this study. GExA is freely available at https://webpark2116.sakura.ne.jp/RNADB.
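The TPM unit mentioned above can be illustrated with the standard definition: counts are first divided by transcript length in kilobases, then rescaled so that each sample sums to one million. The sketch below is only that textbook calculation; the abstract does not say which tool or settings GExA's workflow uses, and the function name `tpm` and the toy numbers are assumptions made for illustration.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts      -- reads assigned to each transcript (hypothetical values)
    lengths_bp  -- transcript lengths in base pairs, same order as counts
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("transcript lengths must be positive")

    # Step 1: length-normalize to reads per kilobase (RPK).
    rpk = [count / (length / 1000) for count, length in zip(counts, lengths_bp)]

    # Step 2: rescale so the values of one sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(counts)
    return [value / total * 1_000_000 for value in rpk]


if __name__ == "__main__":
    # Three toy transcripts whose counts scale with their lengths,
    # so all three end up with the same TPM.
    print(tpm([100, 200, 300], [1000, 2000, 3000]))
```

Because every sample is rescaled to the same total, TPM values are comparable across transcripts within a sample, which suits the kind of cross-tissue browsing the abstract describes.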
Madan, R.; Crane, P. K.; Gennari, J. H.; Latimer, C. S.; Choi, S.-E.; Grabowski, T. J.; Mac Donald, C. L.; Hunt, D.; Postupna, N.; Bajwa, T.; Webster, J.
Quantitative neuropathology has advanced through whole-slide imaging and digital histology platforms. Yet, these measurements rarely align with neuroimaging coordinate frameworks that may be useful for spatial modeling and other applications. QNPtoVox, short for quantitative neuropathology to voxels, is a reproducible, modular pipeline that transforms quantitative metrics generated by digital pathology software (HALO) into voxel-based maps registered to a standard common coordinate (MNI) template. The workflow integrates digital histopathology, gross tissue photography, ex-vivo MRI, and nonlinear registration to generate spatially standardized 3D pathology representations. This Methods article provides a complete procedural description, including required materials, step-wise instructions, operator-dependent checkpoints, expected outputs, reproducibility evaluation, and troubleshooting. QNPtoVox enables voxel-level integration of neuropathology with neuroimaging tools, unlocking existing histopathology datasets for computational modeling and cross-cohort harmonization.
Hole, D. T.; Abdalla, A.; Zubach, V.; Pratt, M.; Van Driel, S.; Ashfaq, S.; Hiebert, J.; Duggan, A. T.
Although vaccine-preventable, measles virus (MeV) continues to pose a significant public health challenge, with a substantial resurgence of cases worldwide. As whole-genome sequencing (WGS) becomes increasingly affordable and routinely adopted in public health laboratories, reliable and accessible analysis of next-generation sequencing (NGS) data is critical for outbreak investigation and molecular surveillance. Here, we present MeaSeq, a fast, user-friendly, open-source bioinformatics pipeline for MeV analysis using Illumina or Oxford Nanopore Technologies (ONT) NGS data. MeaSeq performs quality control assessments, consensus genome assembly and variant detection, optional genotype-specific reference selection, Distinct Sequence Identifier (DSId) assignment via user-provided databases or hashing, sub-consensus variant visualization, genome quality assessment, and standardized HTML reporting. We compared the performance of MeaSeq on NGS data generated from multiple sequencing platforms and targeted enrichment strategies against gold-standard Sanger data, reference genomes, and publicly available comparative data. This validation demonstrates that MeaSeq provides an accurate, reproducible, and accessible solution for routine MeV WGS analysis, supporting genomic surveillance and outbreak response workflows in public health and research settings.
Impact Statement: The recent surge in measles cases worldwide, causing several countries to lose their measles elimination status, underscores the urgent need for effective and accessible genomic surveillance. Our manuscript introduces MeaSeq, a comprehensive and open-source bioinformatics pipeline specifically designed for analyzing MeV NGS data. MeaSeq includes MeV specific analyses such as genotype prediction from sequencing reads with optional genotype-specific reference selection; DSId assignment; quality control checks such as genome rule-of-six divisibility and gene CDS validation; subconsensus nucleotide analysis with mixed-site highlighting; and genomic plotting. By leveraging NGS technology, our pipeline can facilitate the identification of transmission chains and may provide critical insights into the dynamics of MeV outbreaks. This information is essential for public health officials and researchers to implement targeted interventions and optimize vaccine strategies. Additionally, the open-source nature of MeaSeq fosters collaboration and innovation within the scientific measles community along with providing access to a wider range of researchers.
Data Summary: The MeaSeq pipeline code is available on GitHub (https://github.com/phac-nml/measeq). Comparative datasets of publicly available WGS data were accessed through the NCBI Sequence Read Archive under the following BioProjects: PRJNA869081 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA869081); PRJNA480551 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA480551); PRJNA1017431 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1017431); PRJNA1241325 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1241325); PRJNA1174053 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1174053); PRJNA1293457 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1293457); PRJNA843031 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA843031). Whole-genome sequences were included in the validation analysis if they consisted of paired-end data (Illumina) and achieved ≥95% genome completeness following trimming of the 5' and 3' untranslated regions (UTRs). This criterion ensured sufficient genome coverage for robust validation while allowing for limited missing data arising from regions of low sequencing depth or amplicon dropout. A complete list of sequences included in the validation, along with their accession numbers, is provided in Supplementary Table 1.
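Two of the checks named in this abstract, rule-of-six divisibility and the ≥95% completeness threshold, are simple enough to sketch. The code below is not taken from MeaSeq: the function names are hypothetical, and completeness is assumed here to mean the fraction of unambiguous A/C/G/T calls in an already UTR-trimmed consensus, which the abstract does not define.

```python
def follows_rule_of_six(genome: str) -> bool:
    """Return True if the genome length is a non-zero multiple of six."""
    return len(genome) > 0 and len(genome) % 6 == 0


def completeness(consensus: str) -> float:
    """Fraction of positions with an unambiguous A/C/G/T call.

    Assumption: N and other ambiguity codes count as missing data.
    """
    if not consensus:
        return 0.0
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return called / len(consensus)


def passes_validation_filter(consensus: str, threshold: float = 0.95) -> bool:
    """Apply the >=95% completeness criterion described in the Data Summary."""
    return completeness(consensus) >= threshold


if __name__ == "__main__":
    toy = "ACGTAC" * 3 + "NN"  # 20 positions, 18 of them called
    print(follows_rule_of_six(toy))       # length 20 is not a multiple of six
    print(round(completeness(toy), 2))    # 18 / 20
    print(passes_validation_filter(toy))
```

Keeping the two checks separate mirrors how the abstract lists them: rule-of-six is a property of genome length, while completeness is a property of the base calls.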
Del Vecchio, A.; Enoka, R. M.
The scientific literature on human motor units and electromyography (EMG) spans over a century (1925-2025), comprising research impossible to synthesize manually. We introduce NeuromechaniX, a domain-specific platform for automated extraction and meta-analysis of this literature. The core component, MUscraper, is a large language model pipeline that extracts approximately 200 structured metadata fields, organized into 17 major sections spanning participant demographics, EMG acquisition parameters, muscle identification, task protocols, decomposition methods, and motor-unit outcomes, from ~2,000 publications on human limb muscles. This automated extraction transforms heterogeneous narrative reports into a standardized, queryable database at a scale not achievable through manual review. From this dataset, we analyzed motor-unit discharge rate across 208 studies examining seven muscles. Our analyses reveal that discharge rates differ significantly among muscles (p<0.001), with biceps brachii exhibiting the highest rates (15.9 pps), followed by first dorsal interosseous (13.7 pps) and tibialis anterior (13.5 pps), whereas the vastii muscles (11.5 pps), gastrocnemius (11.3 pps), and soleus (9.9 pps) show the lowest rates. Sex-stratified analysis shows females exhibit higher discharge rates than males (14.5 vs 11.9 pps; Cohen's d=0.38, p=0.018). In contrast, age-stratified analysis reveals non-significant differences between young and older adults (d=-0.24, p=0.072). Collectively, these results show that current views of human motor units are limited to a few muscles, with little data on females and older adults. The complete structured database is available through an open-access interactive platform (https://neuro-mechanix.com/metadata), enabling researchers to explore, filter, and download the extracted metadata. 
NeuromechaniX provides infrastructure for large-scale meta-research, identification of literature gaps, and hypothesis generation for the neuromechanics community.
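For readers less familiar with the effect sizes quoted above, Cohen's d is the difference between two group means divided by a pooled standard deviation. The sketch below shows the common pooled-SD form on made-up discharge rates; the abstract does not state which variant of d the authors used or how study-level values were weighted, so both the formula choice and the numbers are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical mean discharge rates (pps) from two sets of studies.
    group_one = [15.0, 13.5, 16.0, 12.5, 14.0]
    group_two = [12.0, 11.0, 13.5, 10.5, 12.5]
    print(round(cohens_d(group_one, group_two), 2))
```

A positive d means the first group has the higher mean, which matches the sign convention implied by the reported female-versus-male comparison.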
Mishra, P.; Gandhi, T. K.; Gandhi, S. R.
Smartphones have become pervasive tools for communication, information consumption, and digital interaction, yet the neurophysiological dynamics associated with naturalistic smartphone use remain insufficiently characterized. Here, we present a multimodal physiological dataset collected during ecologically valid smartphone interaction and a subsequent standardized low-engagement baseline condition. Twenty-three participants engaged with their most frequently used smartphone application (primarily gaming or short-form video) for ten minutes, followed by a five-minute passive viewing of a standardized nature video. Simultaneous recordings were obtained from electroencephalography (EEG; 64 channels), wearable eye-tracking, photoplethysmography (PPG), and galvanic skin response (GSR) sensors. Questionnaire-based assessments, including the smartphone addiction scale (SAS) and the mobile phone problematic use scale (MPPUS), were also collected to characterize individual differences in smartphone-related behavioral traits. All data streams were synchronized using transistor-transistor logic (TTL) trigger signals to ensure precise temporal alignment across modalities. The dataset is organized according to the Brain Imaging Data Structure (BIDS) specification and is publicly available on OpenNeuro (Accession Number: ds007537). This dataset enables the investigation of neural, ocular, and autonomic responses during smartphone interaction and supports multimodal analysis of diverse smartphone behaviors while preserving ecological validity.
Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.
Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind our rate of collecting the data - due to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist: not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE, within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this we make recommendations for open community initiatives in computational bioacoustics.
Demsar, J.; Kraljic, A.; Matkovic, A.; Brege, S.; Pan, L.; Tamayo, Z.; Fonteneau, C.; Helmer, M.; Ji, J. L.; Anticevic, A.; Korponay, C.; Salavrakos, M.; Glasser, M. F.; Nickerson, L. D.; Cho, Y. T.; Repovs, G.
Preprocessing and analysis of neuroimaging data are technically demanding, often requiring a combination of multiple software tools, modality-specific pipelines, and extensive parameter tuning to match dataset characteristics. These complexities make it difficult to document workflows in sufficient detail to ensure complete transparency and reproducibility. To address these challenges, we introduce QuNex recipes, a framework for defining and executing complete neuroimaging workflows - encompassing data onboarding, preprocessing, and analysis - in a transparent, machine- and human-readable format. Recipes are implemented as an integrated feature of the Quantitative Neuroimaging Environment & Toolbox (QuNex), a containerized, open-source platform for end-to-end multimodal and multi-species neuroimaging processing. The recipes framework enables seamless integration of QuNex commands with custom scripts and external tools, capturing every processing step and parameter setting. A fully reproducible study can thus be shared and replicated by providing only (a) the QuNex version used, (b) the recipe file, and (c) the data. This approach standardizes workflow specification, enhances transparency, and enables one-command replication of complex neuroimaging analyses. By providing a standardized way to describe and share workflows, recipes facilitate open exchange of best practices and reproducible methods within the neuroimaging community.
Nabizadeh, F.
Quantitative analysis of positron emission tomography (PET) neuroimaging data is essential for studying neurodegenerative diseases, yet existing processing pipelines often rely on computationally intensive software packages such as FreeSurfer, limiting accessibility for many research groups. Here I introduce BrainPET Studio, an open-source desktop application for atlas-based regional PET quantification that operates entirely in Montreal Neurological Institute (MNI) standard space. BrainPET Studio integrates affine registration, optional Muller-Gartner (MG) partial volume correction (PVC), interactive quality control (QC), and standardized uptake value ratio (SUVR) calculation into a single graphical user interface (GUI), eliminating the requirement for FreeSurfer-based cortical reconstruction. I validated BrainPET Studio against two established pipelines: (1) the UC Berkeley Alzheimer's Disease Neuroimaging Initiative (ADNI) AV1451 (flortaucipir) pipeline, which employs FreeSurfer v7.1.1 parcellation, SPM-based coregistration, and Geometric Transfer Matrix (GTM) PVC in native subject space; and (2) the volBrain/petBrain online platform. Region-of-interest (ROI) SUVR values were compared across 322 subjects. Overall Pearson correlation coefficients for meta-ROI composites ranged from r = 0.83-0.96 versus ADNI and r = 0.86-0.94 versus volBrain/petBrain. Detailed per-subject validation on four representative cases across 112 FreeSurfer-defined regions demonstrated strong agreement for large cortical composites and acceptable variability for smaller medial temporal structures. These results establish BrainPET Studio as a reliable, accessible, and extensible tool for multi-site PET research, educational applications, and studies where FreeSurfer-based processing is impractical.
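The SUVR named in this abstract is, in its usual form, the mean tracer uptake in a target region divided by the mean uptake in a reference region. The sketch below shows only that ratio on toy voxel values; it is not BrainPET Studio's code, the function name `suvr` is hypothetical, and the abstract does not say which reference region or masking rules the application uses.

```python
from statistics import mean


def suvr(target_voxels, reference_voxels):
    """Standardized uptake value ratio: mean(target) / mean(reference).

    Both arguments are flat sequences of voxel intensities already
    extracted from an atlas-defined region (assumed done upstream).
    """
    if not target_voxels or not reference_voxels:
        raise ValueError("both regions must contain at least one voxel")

    reference_mean = mean(reference_voxels)
    if reference_mean <= 0:
        raise ValueError("reference region mean must be positive")

    return mean(target_voxels) / reference_mean


if __name__ == "__main__":
    # Toy intensities for one target ROI and one reference region.
    target = [2.0, 3.0, 4.0]
    reference = [1.5, 1.5]
    print(suvr(target, reference))
```

Dividing by a reference region removes subject-level scaling such as injected dose and body mass, which is why regional SUVRs from different pipelines can be compared by correlation as the abstract reports.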